Abstract

For supercritical multitype Markov branching processes in continuous time, 
we investigate the evolution of types along those lineages that survive up to 
some time t. We establish almost-sure convergence theorems for both time 
and population averages of ancestral types (conditioned on non-extinction), 
and identify the mutation process describing the type evolution along typical 
lineages. An important tool is a representation of the family tree in terms of 
a suitable size-biased tree with trunk. As a by-product, this representation 
allows a 'conceptual proof' (in the sense of [19]) of the continuous-time version
of the Kesten-Stigum theorem. 
1. Introduction

Looking at the time evolution of a population one has two possible perspectives: 
either forward or backward in time. In the first case one observes the characteristics of 
the population at a given time t and asks for its behaviour as t increases to infinity. A 
classical model that describes the unrestricted reproduction of independent individuals 
is the (multitype) branching process, and a principal result in the supercritical case 
is the Kesten-Stigum theorem [16], which describes the population size and relative 
frequencies of types; see Theorem 2.1 for the precise statement. A different situation 
arises if the population size is kept constant; this leads to certain interacting particle 
systems, like the Moran model and its relatives (for a review, see [7]). By way of contrast,
the backwards - or retrospective - aspect of the population concerns the lineages 
extending back into the past from the presently living individuals and asks for the 
characteristics of the ancestors along such lineages. One famous example is Kingman's 
coalescent (see [17, 18], and [22] for a review), the backward version of the Moran 
model. As was observed e.g. by Jagers [14] and Jagers and Nerman [15], it is also 
rewarding to study the backward aspects of multitype branching processes; this point 
of view has turned out to be crucial in recent biological applications [11]. It is the aim
of this article to pursue this last line of research further. We do so in continuous
time because this gives us the opportunity to transfer some powerful methods recently 
developed for discrete time. We also concentrate on the supercritical case. 

Specifically, we consider the individuals alive at some time t and investigate the 
types of their ancestors at an earlier time, $t-u$. We will show the following.

• When $t$, resp. $t$ and $u$, tend to infinity, both the time average and the population
average of ancestral types converge to a particular distribution $\alpha$ almost surely on
non-extinction (Theorems 3.1 and 3.2).

This $\alpha$ will be called the ancestral distribution of types; its components are
$\alpha_i = \pi_i h_i$, where $\pi$ and $h$ are the (properly normalized) left and right
Perron-Frobenius eigenvectors of the generator of the first-moment matrix.

More detailed information about the evolution of types along ancestral lineages 
is obtained through what we would like to call the retrospective mutation chain, a 
particular continuous-time Markov chain on the type space with $\alpha$ as its invariant
distribution. We will show: 

• For all individuals alive at time $t$ up to an asymptotically negligible fraction,
the time-averaged empirical type evolution process tends in distribution to the
stationary retrospective mutation chain, in the limit as $t \to \infty$, almost surely on
non-extinction (Theorem 3.3).

One basic ingredient of our reasoning is a law of large numbers for population 
averages; see Proposition 5.1. A second crucial ingredient is a representation of the 
family tree in terms of a size-biased tree with trunk (with the retrospective mutation 
chain running along the trunk); see Theorem 4.1. This representation is the continuous- 
time analogue of the size-biased tree representation introduced by Lyons, Pemantle 
and Peres [20] and Kurtz, Lyons, Pemantle and Peres [19]. In passing, it allows us to 
extend their conceptual proof of the Kesten-Stigum theorem to continuous time. The 
third ingredient is the Donsker-Varadhan large deviation principle for the retrospective 
mutation chain [5, 6]. This implies a large deviation principle for the typical type 
evolution along the surviving lineages in the tree - see Theorem 5.1. 

This paper is organized as follows. In the next section we recall the construction 
of the family tree for multitype branching processes in continuous time. Section 3 
contains the precise statement of results. Section 4 is devoted to the size-biased tree 
with trunk, and the proofs of the main results are collected in Section 5. 

2. The branching process and basic facts 

We consider a continuous-time multitype branching process as described in Athreya
and Ney [2, Ch. V.7]. To fix the notation we recall the basic setting here.

Let $S$ be a finite set of types. An individual of type $i \in S$ lives for an exponential
time with parameter $a_i > 0$, and then splits into a random offspring $N_i = (N_{ij})_{j \in S}$
with distribution $p_i$ on $\mathbb{Z}_+^S$ and finite means $m_{ij} := \mathbb{E}(N_{ij})$ for all
$i,j \in S$; here, $N_{ij}$ is the number of $j$-children, and $\mathbb{Z}_+ = \{0, 1, \ldots\}$.
We assume that the mean offspring matrix $M = (m_{ij})_{i,j \in S}$ is irreducible.

According to Harris [10, Ch. VI], the associated random family tree can be constructed
as follows. Let $\mathbb{X} = \bigcup_{n \ge 0} \mathbb{X}_n$, where $\mathbb{X}_n$ describes the
virtual $n$-th generation. That is, $\mathbb{X}_0 = S$, and $i_0 \in \mathbb{X}_0$ specifies the
type of the root, i.e., the founding ancestor. Next, $\mathbb{X}_1 = S \times \mathbb{N}$, and the
element $x = (i_1; \ell_1) \in \mathbb{X}_1$ is the $\ell_1$-th $i_1$-child of the root. Finally,
for $n > 1$, $\mathbb{X}_n = S^n \times \mathbb{N}^n$, and
$x = (i_1, \ldots, i_n; \ell_1, \ldots, \ell_n) \in \mathbb{X}_n$ is the $\ell_n$-th $i_n$-child of
its parent $\tilde x = (i_1, \ldots, i_{n-1}; \ell_1, \ldots, \ell_{n-1})$; see Fig. 1. We write
$\sigma(x) = i_n$ for the type of $x \in \mathbb{X}_n$. With each $x \in \mathbb{X}$ we associate

• its random life time $\tau_x$, distributed exponentially with parameter $a_{\sigma(x)}$, and

• its random offspring $N_x = (N_{x,j})_{j \in S} \in \mathbb{Z}_+^S$ with distribution $p_{\sigma(x)}$

such that the family $\{\tau_x, N_x : x \in \mathbb{X}\}$ is independent.

The random variables $N_x$ indicate which of the virtual individuals $x \in \mathbb{X}$ are
actually realized, namely those in the random set $X = \bigcup_{n \ge 0} X_n$ defined
recursively by

$$X_0 = \{i_0\}, \qquad X_n = \{x = (\tilde x; i_n, \ell_n) \in \mathbb{X}_n : \tilde x \in X_{n-1},\ \ell_n \le N_{\tilde x, i_n}\},$$

where $i_0$ is the prescribed type of the root. The random variables $\tau_x$ provide the
proper time scale. Namely, for $x \in X$, let the splitting times $T_x$ be defined recursively
by $T_x = T_{\tilde x} + \tau_x$ with $T_{\tilde i_0} := 0$. The lifetime interval of $x \in X$ is then
$[T_{\tilde x}, T_x[$. Hence $X(t) = \{x \in X : T_{\tilde x} \le t < T_x\}$ is the population at
time $t$. One may visualize the resulting tree by identifying each $x \in X$ with an edge from
$\tilde x$ to $x$ with length $\tau_x$ in the direction of time.
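The dynamics just described can be sketched in a few lines of Python. The sketch below tracks only the type counts (not the labelled tree), using the fact that each living individual of type i splits independently at rate a_i. The rates `a` and the offspring laws `laws` are hypothetical illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])  # split rates a_i
# laws[i] = (probabilities, offspring vectors) for a parent of type i
laws = [
    (np.array([0.25, 0.50, 0.25]), np.array([[0, 0], [1, 1], [2, 0]])),
    (np.array([0.25, 0.50, 0.25]), np.array([[0, 0], [0, 2], [2, 2]])),
]

def mean_offspring(laws):
    """Mean offspring matrix M = (m_ij) implied by the offspring laws."""
    return np.array([probs @ kids for probs, kids in laws])

def simulate_counts(i0, t_max, a, laws, rng):
    """Simulate the type counts at time t_max, starting from one individual of type i0."""
    z = np.zeros(len(a), dtype=int)
    z[i0] = 1
    t = 0.0
    while z.sum() > 0:
        rates = a * z                        # total split rate per type
        total = rates.sum()
        t += rng.exponential(1.0 / total)    # waiting time until the next split
        if t > t_max:
            break
        i = rng.choice(len(a), p=rates / total)   # type of the splitting individual
        probs, kids = laws[i]
        k = rng.choice(len(probs), p=probs)       # draw its offspring vector
        z[i] -= 1
        z += kids[k]
    return z

rng = np.random.default_rng(0)
print("M =", mean_offspring(laws))
print("counts at t = 3:", simulate_counts(0, 3.0, a, laws, rng))
```

Because the lifetimes are exponential, the counts form a Markov jump process, which is why the sketch does not need to store individual lifetimes.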




[Figure 1: family tree drawn along a horizontal time axis with marked times s and t; in the realization shown, Z(s) = (3,1,1).]

Figure 1: A realization of the branching process. Types are indicated by different line types,
indexed in the order (black, grey, dashed), counted from top to bottom, and symbolized by
filled circles. The set $X(s)$ consists of all edges that intersect the vertical line at $s$; the set
$X(x,t)$ consists of all edges that emanate from $x$ and hit the vertical line at $t$. $Z(s)$ counts
the type frequencies in the population $X(s)$.
The family tree is completely determined by the process $X[0,\infty[ \,:= (X(t))_{t \ge 0}$,
which is a random element of $\Omega := D([0,\infty[, \mathcal{P}_f(\mathbb{X}))$, the Skorohod space
of all cadlag functions on $[0,\infty[$ taking values in the (countable) set
$\mathcal{P}_f(\mathbb{X})$ of all finite subsets of $\mathbb{X}$. We write $\mathbb{P}^i$ for the
distribution of $X[0,\infty[$ on $\Omega$ when the type of the root is $i_0 = i$, and
$\mathbb{E}^i$ for the associated expectation. If $i_0$ is chosen randomly with distribution
$\nu$, we write $\mathbb{P}^\nu$ and $\mathbb{E}^\nu$. We will often identify $X[0,\infty[$ with the
canonical process on $\Omega$.

For $0 \le s < t$ and $y \in X(t)$ we write $y(s)$ for its unique ancestor living at time $s$.
On the other hand, for $x \in X(s)$ we let

$$X(x,t) = \{y \in \mathbb{X} : xy \in X(t)\} \tag{2.1}$$

denote the set of descendants of $x$ living at time $t$; cf. Fig. 1. In the above, the
concatenation $xy$ of two strings $x, y \in \mathbb{X}$ is defined in the obvious way, and the
empty string is considered as an ancestor of type $\sigma(x)$; i.e., $X(x,t) = \{\sigma(x)\}$ as
long as $x \in X(t)$. By the loss-of-memory property of the exponential distributions, the
descendant trees $X(x,[s,\infty[) = (X(x,t))_{t \ge s}$ with $x \in X(s)$ are conditionally
independent given $X[0,s]$, with distribution $\mathbb{P}^{\sigma(x)}$. We will also consider the
counting measures

$$Z(t) = \sum_{x \in X(t)} \delta_{\sigma(x)}, \qquad Z(x,t) = \sum_{y \in X(x,t)} \delta_{\sigma(y)} \tag{2.2}$$

on $S$, where $\delta_i$ is the Dirac measure at $i$. $Z(t)$ and $Z(x,t)$ count the type
frequencies in the population $X(t)$ resp. the subpopulation $X(x,t)$ of $x$-descendants. In
particular, $Z_j(t)$ is the cardinality of $X_j(t) = \{x \in X(t) : \sigma(x) = j\}$, the
subpopulation of type $j \in S$, and $\|Z(t)\| := \sum_{j \in S} Z_j(t) = |X(t)|$ the total size
of the population.

It is well-known (cf. [2], p. 202, Eq. 9) that $\mathbb{E}^i(Z_j(t)) = (e^{tA})_{ij}$ for all
$i,j \in S$, where the generator matrix $A = (a_{ij})_{i,j \in S}$ is given by

$$a_{ij} = a_i (m_{ij} - \delta_{ij}). \tag{2.3}$$

By the irreducibility of $M$, $A$ is also irreducible, so that the first moment matrix
$(\mathbb{E}^i(Z_j(t)))_{i,j \in S}$ has positive entries for any $t > 0$. (This property is often
called 'positive regularity', see [2, p. 202].) Perron-Frobenius theory then tells us that the
matrix $A$ has a principal eigenvalue $\lambda$ (i.e., a real eigenvalue exceeding the real parts
of all other eigenvalues), and associated positive left and right eigenvectors $\pi$ and $h$
which will be normalized such that $\langle \pi, 1 \rangle = 1 = \langle \pi, h \rangle$. Here we think
of the row vector $\pi$ as a probability measure, of the column vectors $h$ and
$1 = (1, \ldots, 1)^T$ as functions on $S$, and of the scalar product
$\langle \pi, h \rangle = \sum_{i \in S} \pi_i h_i$ as the associated expectation. We are mainly
interested in the supercritical case $\lambda > 0$. In this case we write

$$\Omega_{\mathrm{surv}} := \{X(t) \ne \emptyset \text{ for all } t > 0\}$$

for the event that the population survives for all times.
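As a numerical illustration, the following Python sketch builds the generator from (2.3) and extracts the principal eigenvalue together with the normalized left and right eigenvectors. The split rates `a` and the mean offspring matrix `M` are a hypothetical two-type example chosen only for illustration, and the helper names are not from the paper.

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])                 # split rates a_i
M = np.array([[1.0, 0.5],
              [0.5, 1.5]])               # mean offspring matrix (m_ij)

def generator(a, M):
    """Generator A with entries a_ij = a_i * (m_ij - delta_ij), cf. (2.3)."""
    return a[:, None] * (M - np.eye(len(a)))

def principal_triple(A):
    """Return (lam, pi, h): principal eigenvalue and left/right eigenvectors,
    normalized so that sum(pi) = 1 and <pi, h> = 1."""
    evals, right = np.linalg.eig(A)
    k = np.argmax(evals.real)
    lam = evals[k].real
    h = np.abs(right[:, k].real)
    evals_left, left = np.linalg.eig(A.T)
    pi = np.abs(left[:, np.argmax(evals_left.real)].real)
    pi = pi / pi.sum()
    h = h / (pi @ h)
    return lam, pi, h

A = generator(a, M)
lam, pi, h = principal_triple(A)
print("lambda:", lam, "supercritical:", lam > 0)
print("pi:", pi, "h:", h)
```

Taking absolute values of the eigenvector entries only fixes the overall sign, since the Perron-Frobenius eigenvectors of an irreducible generator have entries of one sign.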

It is a remarkable fact that the almost-sure behaviour of the family tree is, to a
large extent, already determined by the global quantities $\lambda$, $\pi$, $h$. One prominent
example is the following continuous-time version of the Kesten-Stigum theorem (see
[16] for the discrete-time original, [1] for the continuous-time version, and [19] for the
recent discrete-time conceptual proof).
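Theorem 2.1. (Kesten-Stigum.) Consider the supercritical case $\lambda > 0$.

(a) For all $i \in S$ we have

$$\frac{1}{|X(t)|} \sum_{x \in X(t)} \delta_{\sigma(x)} = \frac{Z(t)}{\|Z(t)\|} \longrightarrow \pi \quad \text{as } t \to \infty, \quad \mathbb{P}^i\text{-almost surely on } \Omega_{\mathrm{surv}}.$$

(b) There is a nonnegative random variable $W$ such that

$$\lim_{t \to \infty} Z(t)\, e^{-\lambda t} = W \pi \quad \mathbb{P}^i\text{-almost surely for any } i \in S,$$

and $\mathbb{P}^i(W > 0) > 0$ for all $i$ if and only if

$$\mathbb{E}(N_{ij} \log N_{ij}) < \infty \quad \text{for all } i,j \in S. \tag{2.4}$$

In this case, $\{W > 0\} = \Omega_{\mathrm{surv}}$ $\mathbb{P}^i$-almost surely, and $h_i = \mathbb{E}^i(W)$.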

For the sake of reference we provide here a full proof extending the conceptual
discrete-time proof of [19] to our continuous-time setting. Assertion (a) reveals that
the left eigenvector $\pi$ holds the asymptotic proportions of the types in the population,
and statement (b) implies that

$$\lambda = \lim_{t \to \infty} \frac{1}{t} \log |X(t)|$$

is the almost sure exponential growth rate of the population in the case of survival. In
fact, this statement does not require condition (2.4); see the proof of Theorem 3.3 in
Section 5.3. The $i$-th coordinate $h_i$ of the right eigenvector $h$ measures the long-term
fertility of an $i$-individual. In fact, $h_i$ is also characterized by the limiting relation

$$\mathbb{E}^i(|X(t)|)\, e^{-\lambda t} \to h_i \quad \text{as } t \to \infty; \tag{2.5}$$

cf. Remark 4.1(a) below.
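The first-moment content of these statements can be checked numerically: since the expected type counts are given by the matrix exponential of the generator, relation (2.5) and its type-resolved version amount to convergence of the rescaled matrix exponential to the rank-one matrix with entries h_i times pi_j. The Python sketch below uses the same hypothetical two-type parameters as before (illustrative only) and a simple eigendecomposition-based matrix exponential, which assumes the generator is diagonalizable.

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
M = np.array([[1.0, 0.5], [0.5, 1.5]])
A = a[:, None] * (M - np.eye(2))         # generator, cf. (2.3)

def principal_triple(A):
    """Principal eigenvalue and normalized left/right eigenvectors of A."""
    evals, right = np.linalg.eig(A)
    k = np.argmax(evals.real)
    lam, h = evals[k].real, np.abs(right[:, k].real)
    evals_left, left = np.linalg.eig(A.T)
    pi = np.abs(left[:, np.argmax(evals_left.real)].real)
    pi = pi / pi.sum()
    return lam, pi, h / (pi @ h)

def expm_eig(A, t):
    """exp(tA) via eigendecomposition (assumes A is diagonalizable)."""
    evals, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(t * evals)) @ np.linalg.inv(V)).real

lam, pi, h = principal_triple(A)
for t in (1.0, 5.0, 20.0):
    scaled = expm_eig(A, t) * np.exp(-lam * t)   # E^i(Z_j(t)) e^{-lambda t}
    gap = np.abs(scaled - np.outer(h, pi)).max()
    print(f"t = {t:5.1f}   max deviation from h_i * pi_j: {gap:.2e}")
```

Summing the rescaled matrix over j gives the left-hand side of (2.5), so the row sums approach h_i at the same rate.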



3. Results 

We still consider the supercritical case $\lambda > 0$. We are interested in the mutation
behaviour of the population tree. More specifically, we ask for the behaviour of the
sequence of types along a typical branch of this tree. It turns out that this behaviour
is again completely determined by the global quantities $\lambda$, $\pi$, $h$. A key role is played
by the probability vector $\alpha = (\alpha_i)_{i \in S}$ with components $\alpha_i = \pi_i h_i$. As
observed by Jagers [14, Corollary 1], Jagers and Nerman [15, Prop. 1], and Hermisson et al. [11],
this probability vector describes the distribution of ancestral types of an equilibrium
population with type frequencies given by $\pi$. The vector $\alpha$ will therefore be called the
ancestral distribution. Our results below shed some additional light on the significance
of $\alpha$.

To begin, we consider a typical individual $x \in X(t)$ alive at some large time $t$ and
ask for the type $\sigma(x(t-u))$ of its ancestor $x(t-u)$ living at some earlier time $t-u$.
We find that $\sigma(x(t-u))$ is asymptotically distributed according to $\alpha$. Specifically, let
$0 < u < t$ and

$$A^u(t) = \frac{1}{|X(t)|} \sum_{x \in X(t)} \delta_{\sigma(x(t-u))} \tag{3.1}$$

be the empirical ancestral type distribution at time $t-u$ taken over the population
$X(t)$. (Of course, this definition requires that $X(t) \ne \emptyset$.)

Theorem 3.1. (Population average of ancestral types.) Let $\lambda > 0$ and $i \in S$. Then

$$\lim_{u \to \infty} \lim_{t \to \infty} A^u(t) = \alpha \quad \mathbb{P}^i\text{-almost surely on } \Omega_{\mathrm{surv}}. \tag{3.2}$$

The proof will be given in Section 5.2. We would like to remark that a slightly weaker
result under slightly stronger conditions (convergence in probability under assumption
(2.4)) follows immediately from Corollary 4 of Jagers and Nerman [15], where very
general population averages are considered.

Remark 3.1. Assertion (3.2) means that, for each $j \in S$, the average

$$A^u_j(t) = \frac{1}{|X(t)|} \sum_{x \in X(t)} I\{\sigma(x(t-u)) = j\}$$

(with $I\{\cdot\}$ denoting the indicator function) converges to $\alpha_j$ $\mathbb{P}^i$-almost surely
on $\Omega_{\mathrm{surv}}$ as $t \to \infty$ and $u \to \infty$ in this order. Letting $s = t-u$, we can
rewrite this average in the form

$$\sum_{x \in X_j(s)} |X(x,t)| \Big/ \sum_{x \in X(s)} |X(x,t)|,$$

where $X(x,t)$ is given by (2.1). The numbers $|X(x,t)|$ with $x \in X_j(s)$ are i.i.d.
with mean $\mathbb{E}^j(|X(u)|)$. Assuming the validity of a law of large numbers and using
Theorem 2.1(a) and Eq. (2.5), we can conclude that the average above converges to
$\pi_j h_j / \langle \pi, h \rangle = \alpha_j$ as $s, u \to \infty$. This explains the particular structure
of the ancestral distribution $\alpha$.
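The heuristic of Remark 3.1 can be traced numerically at the level of expectations: replacing the type frequencies at time s by the left eigenvector and the descendant counts by their means gives a ratio that depends on u only and converges to the ancestral distribution. The Python sketch below does this for a hypothetical two-type example (illustrative parameters and helper names, not from the paper).

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
M = np.array([[1.0, 0.5], [0.5, 1.5]])
A = a[:, None] * (M - np.eye(2))

def principal_triple(A):
    """Principal eigenvalue and normalized left/right eigenvectors of A."""
    evals, right = np.linalg.eig(A)
    k = np.argmax(evals.real)
    lam, h = evals[k].real, np.abs(right[:, k].real)
    evals_left, left = np.linalg.eig(A.T)
    pi = np.abs(left[:, np.argmax(evals_left.real)].real)
    pi = pi / pi.sum()
    return lam, pi, h / (pi @ h)

def expm_eig(A, t):
    """exp(tA) via eigendecomposition (assumes A is diagonalizable)."""
    evals, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(t * evals)) @ np.linalg.inv(V)).real

def expected_ancestral_ratio(u):
    """pi_j E^j(|X(u)|) / sum_k pi_k E^k(|X(u)|), the mean-field version of A^u_j."""
    mean_descendants = expm_eig(A, u) @ np.ones(len(a))   # E^j(|X(u)|)
    weights = pi * mean_descendants
    return weights / weights.sum()

lam, pi, h = principal_triple(A)
alpha = pi * h
for u in (0.5, 2.0, 8.0, 20.0):
    print(u, expected_ancestral_ratio(u), "target alpha:", alpha)
```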

In our next theorem we ask for the time average of types along the line of descent
leading to a typical $x \in X(t)$. This time average is given by the empirical distribution

$$L^x(t) = \frac{1}{t} \int_0^t \delta_{\sigma(x(s))}\, ds$$

of the process $\sigma(x[0,t]) = (\sigma(x(s)))_{0 \le s \le t}$. Note that $L^x(t)$ belongs to the
simplex $\mathcal{P}(S)$ of all probability vectors on $S$; $\mathcal{P}(S)$ will be equipped with the
usual total variation distance $\|\cdot\|$. To describe the behaviour of $L^x(t)$ for a typical
$x \in X(t)$ we have to step one level higher and to consider the empirical distribution of
$L^x(t)$ taken over the population $x \in X(t)$. This empirical distribution belongs to
$\mathcal{P}(\mathcal{P}(S))$, the set of probability measures on $\mathcal{P}(S)$, which will be
equipped with the weak topology.
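For a single lineage, the time average of types is just the vector of fractions of time spent in each type. A minimal Python sketch, assuming the type history is stored as a list of jump times and the types held between them (a hypothetical data layout chosen for illustration):

```python
import numpy as np

def time_average(jump_times, types, t, n_types):
    """Empirical distribution of a piecewise-constant type history on [0, t].

    types[k] is the type held on [jump_times[k], jump_times[k+1]), and the
    last type is held up to time t. Assumes 0 = jump_times[0] < ... < t.
    """
    bounds = np.append(np.asarray(jump_times, dtype=float), t)
    durations = np.diff(bounds)                      # time spent in each visit
    occupation = np.bincount(types, weights=durations, minlength=n_types)
    return occupation / t

# Hypothetical lineage: type 0 on [0, 1), type 1 on [1, 2.5), type 0 on [2.5, 4].
L = time_average([0.0, 1.0, 2.5], [0, 1, 0], t=4.0, n_types=2)
print(L)
```

The result is a probability vector on the type space, that is, a point of the simplex on which the population-level empirical distribution is then formed.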

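Theorem 3.2. (Time average of ancestral types.) Let $\lambda > 0$ and $i \in S$. Then

$$\lim_{t \to \infty} \frac{1}{|X(t)|} \sum_{x \in X(t)} \delta_{L^x(t)} = \delta_\alpha \quad \mathbb{P}^i\text{-almost surely on } \Omega_{\mathrm{surv}}. \tag{3.3}$$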

Remark 3.2. (a) According to the portmanteau theorem [8, p. 108, Th. 3.1], statement
(3.3) is equivalent to the assertion that

$$\lim_{t \to \infty} \frac{1}{|X(t)|} \sum_{x \in X(t)} I\{L^x(t) \in F\} = 0 \quad \text{for each closed } F \subset \mathcal{P}(S) \text{ with } \alpha \notin F$$

$\mathbb{P}^i$-almost surely on $\Omega_{\mathrm{surv}}$, and it is sufficient to check this in the case
when $F = \{\nu \in \mathcal{P}(S) : \|\nu - \alpha\| \ge \varepsilon\}$ with arbitrary $\varepsilon > 0$.
The theorem therefore asserts that, for all individuals $x \in X(t)$ up to an asymptotically
negligible fraction, the ancestral type average $L^x(t)$ is close to $\alpha$.

(b) Theorem 3.2 involves a population average of time averages. So one may ask
whether the averaging of population and time can be interchanged. It follows from
Theorem 3.1 that this is indeed the case:

$$\lim_{t \to \infty} \frac{1}{t} \int_0^t A^u(t)\, du = \alpha$$

almost surely on $\Omega_{\mathrm{surv}}$.

Theorem 3.2 is in fact a corollary of our next theorem which considers the complete 
mutation history along a typical line of descent. To state this result we need some 
preparations. We introduce first the mutation process on S which will turn out to 
describe the time-averaged mutation behaviour along an ancestral line. 

Definition: The retrospective mutation chain is the Markov chain $(\sigma(t))_{t \ge 0}$ on $S$
which stays in a state $i \in S$ for an exponential holding time with parameter $a_i + \lambda$ and
then jumps to $j \in S$ with probability

$$p_{ij} = \frac{m_{ij} h_j}{(1 + \lambda/a_i)\, h_i}.$$

That is, the generator $G = (g_{ij})_{i,j \in S}$ of $(\sigma(t))_{t \ge 0}$ is given by

$$g_{ij} = (a_i + \lambda)(p_{ij} - \delta_{ij}) = h_i^{-1} (a_{ij} - \lambda \delta_{ij}) h_j.$$

We note that $G$ is indeed a generator because
$a_i \sum_{j \in S} m_{ij} h_j = \sum_{j \in S} (a_i \delta_{ij} + a_{ij}) h_j = (a_i + \lambda) h_i$ by (2.3).
Since $M$ is irreducible by assumption, $G$ is irreducible as well. It is also immediate that the
ancestral distribution $\alpha$ is the (unique) stationary distribution of $G$. The retrospective
mutation chain was identified by Jagers [13, p. 195] and may be interpreted as the forward
version of the backward Markov chain [15, Proposition 1] that results from picking individuals
randomly from the stationary type distribution $\pi$ and following their lines of descent
backward in time. This gives the transition rates

$$\bar g_{ij} = \pi_j (a_{ji} - \lambda \delta_{ij}) \pi_i^{-1} = \alpha_j g_{ji} \alpha_i^{-1}, \tag{3.4}$$

which corresponds to the time reversal of the retrospective mutation chain.
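The algebra behind this definition can be verified numerically. The Python sketch below constructs the jump probabilities, the generator of the retrospective mutation chain, and the time-reversed rates of (3.4) for a hypothetical two-type example, and prints the residuals of the identities stated above. The parameter values and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
M = np.array([[1.0, 0.5], [0.5, 1.5]])
I2 = np.eye(2)
A = a[:, None] * (M - I2)                         # generator, cf. (2.3)

# Principal eigenvalue and normalized eigenvectors.
evals, right = np.linalg.eig(A)
k = np.argmax(evals.real)
lam, h = evals[k].real, np.abs(right[:, k].real)
evals_left, left = np.linalg.eig(A.T)
pi = np.abs(left[:, np.argmax(evals_left.real)].real)
pi = pi / pi.sum()
h = h / (pi @ h)
alpha = pi * h                                    # ancestral distribution

# Jump probabilities p_ij and generator G of the retrospective mutation chain.
P = M * h[None, :] / ((1.0 + lam / a)[:, None] * h[:, None])
G = (a + lam)[:, None] * (P - I2)
G_alt = (A - lam * I2) * h[None, :] / h[:, None]  # h_i^{-1} (a_ij - lam delta_ij) h_j

# Time-reversed rates of (3.4): alpha_j g_ji / alpha_i.
G_rev = alpha[None, :] * G.T / alpha[:, None]

print("row sums of P:", P.sum(axis=1))
print("two forms of G agree:", np.allclose(G, G_alt))
print("alpha G:", alpha @ G)
print("row sums of reversed generator:", G_rev.sum(axis=1))
```

Note that when m_ii > 0 the chain may "jump" from i to i; the diagonal entries of the generator already account for this.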

To set up the stage for Theorem 3.3 we let $\Sigma = D(\mathbb{R}, S)$ denote the space of all
doubly infinite cadlag paths in $S$. $\Sigma$ will be equipped with the usual Skorohod topology
which turns $\Sigma$ into a Polish space; see e.g. [8], Section 3.5 and in particular Th. 5.6, for
the case of the time interval $[0, \infty[$. The associated Borel $\sigma$-algebra coincides with the
$\sigma$-algebra generated by the evaluation maps $\Sigma \ni \sigma \mapsto \sigma(t)$, $t \in \mathbb{R}$
[8, p. 127, Prop. 7.1]. The time shift $\vartheta_s$ on $\Sigma$ is defined by

$$\vartheta_s \sigma(t) = \sigma(t+s), \qquad s, t \in \mathbb{R},\ \sigma \in \Sigma.$$

We write $\mathcal{P}_\Theta(\Sigma)$ for the set of all probability measures on $\Sigma$ which are
invariant under the shift group $\Theta = (\vartheta_s)_{s \in \mathbb{R}}$. Endowed with the weak
topology, $\mathcal{P}_\Theta(\Sigma)$ is a Polish space [8, p. 101, Th. 1.7].

Next we introduce the time-averaged type evolution process of an individual in the
population tree. For $t > 0$ and $x \in X(t)$ we let $\sigma(x)_{t,\mathrm{per}} \in \Sigma$ be defined by

$$\sigma(x)_{t,\mathrm{per}}(s) = \sigma(x(s_t)), \qquad s \in \mathbb{R}, \tag{3.5}$$

where $s_t$ is the unique number in $[0,t[$ with $s \equiv s_t \bmod t$. That is,
$\sigma(x)_{t,\mathrm{per}} \in \Sigma$ is the periodically continued type history of $x$ up to time $t$.
The time-averaged type evolution of $x$ is then described by the empirical type evolution process

$$R^x(t) = \frac{1}{t} \int_0^t \delta_{\vartheta_s \sigma(x)_{t,\mathrm{per}}}\, ds \ \in \mathcal{P}_\Theta(\Sigma). \tag{3.6}$$
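The periodic continuation in (3.5) is simple to implement for a piecewise-constant type history. The following Python sketch assumes the history on [0, t[ is stored as jump times plus the types held between them (a hypothetical layout used only for illustration) and evaluates the periodized path at an arbitrary real time.

```python
import numpy as np

def periodized_type(s, jump_times, types, t):
    """Value at time s of the t-periodic continuation of a type history.

    types[k] is held on [jump_times[k], jump_times[k+1]); the last type is
    held up to t. The path is right-continuous at jump times (cadlag).
    """
    s_t = s % t                                   # representative of s in [0, t)
    k = np.searchsorted(jump_times, s_t, side="right") - 1
    return types[k]

# Hypothetical lineage: type 0 on [0, 1), type 1 on [1, 2.5), type 0 on [2.5, 4).
jumps, kinds, horizon = [0.0, 1.0, 2.5], [0, 1, 0], 4.0
for s in (-0.5, 0.99, 1.0, 5.0, 7.9):
    print(s, periodized_type(s, jumps, kinds, horizon))
```

Shifting the argument by the period leaves the value unchanged, which is what makes the time average in (3.6) shift-invariant.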

We are interested in the typical behaviour of $R^x(t)$ when $x$ is picked at random from
$X(t)$, the population at time $t$. This is captured in their empirical distribution, i.e.,
the population average

$$\Gamma(t) = \frac{1}{|X(t)|} \sum_{x \in X(t)} \delta_{R^x(t)}. \tag{3.7}$$

(As before, this definition requires that $X(t) \ne \emptyset$.) $\Gamma(t)$ is a random element of
$\mathcal{P}(\mathcal{P}_\Theta(\Sigma))$, the set of all probability measures on the Polish space
$\mathcal{P}_\Theta(\Sigma)$, which is again equipped with the weak topology. In Section 5.3 we
will prove:

Theorem 3.3. (Typical ancestral type evolution.) Let $\lambda > 0$ and $i \in S$. Then

$$\lim_{t \to \infty} \Gamma(t) = \delta_\mu \quad \mathbb{P}^i\text{-almost surely on } \Omega_{\mathrm{surv}}, \tag{3.8}$$

where $\mu \in \mathcal{P}_\Theta(\Sigma)$ is the distribution of the stationary (doubly infinite)
retrospective mutation chain $(\sigma(t))_{t \in \mathbb{R}}$ with generator $G$ and invariant
distribution $\alpha$.

Remark 3.3. As in Remark 3.2(a), the portmanteau theorem implies that (3.8) is
equivalent to the assertion that, $\mathbb{P}^i$-almost surely on $\Omega_{\mathrm{surv}}$,
$\Gamma(t)(F) \to 0$ for every closed $F \subset \mathcal{P}_\Theta(\Sigma)$ such that $\mu \notin F$.
Writing $d(\cdot,\cdot)$ for any metric metrizing the weak topology on $\mathcal{P}_\Theta(\Sigma)$
this in turn means that, for each $\varepsilon > 0$,

$$\lim_{t \to \infty} \frac{1}{|X(t)|} \big| \{x \in X(t) : d(R^x(t), \mu) \ge \varepsilon\} \big| = 0$$

$\mathbb{P}^i$-almost surely on $\Omega_{\mathrm{surv}}$. The theorem therefore states that, for all
individuals $x \in X(t)$ up to an asymptotically negligible fraction, the time-averaged ancestral
type evolution process $R^x(t)$ is close to $\mu$ in the weak topology. Theorem 3.3 also highlights
the retrospective nature of our mutation chain: it describes the evolution of types
along those lines of descent which survive until time $t$ (and thus can be seen when a
time-$t$ individual looks back into the past).
4. Size-biasing of the family tree

In this section we construct a continuous-time version of the size-biased multitype
Galton-Watson tree as introduced by Lyons, Pemantle, Peres, and Kurtz [20, 19].
Informally, this is a tree with a randomly selected trunk (or spine) along which time
runs at a different rate and offspring is weighted according to its size; in particular,
there is always at least one offspring along the trunk so that the trunk survives forever.
The children off the trunk get ordinary (unbiased) descendant trees (the bushes). It
will turn out that the trunk of the size-biased tree describes the evolution along a
typical ancestral line that survives up to some fixed time. The construction is not
confined to the supercritical case; that is, in this section $\lambda$ can have arbitrary sign.
First of all, for each type $i \in S$ we introduce the size-biased offspring distribution

$$\hat p_i(\kappa) = \frac{\langle \kappa, h \rangle\, p_i(\kappa)}{c_i h_i}, \qquad \kappa \in \mathbb{Z}_+^S, \tag{4.1}$$

where $\langle \kappa, h \rangle = \sum_{j \in S} \kappa_j h_j$ and $c_i = 1 + \lambda/a_i$ is a normalizing
constant. $\hat p_i$ will serve as the offspring distribution of an $i$-individual on the trunk; it
is indeed a probability distribution since

$$\sum_{\kappa} \langle \kappa, h \rangle\, p_i(\kappa) = \sum_{j \in S} m_{ij} h_j = \sum_{j \in S} (\delta_{ij} + a_{ij}/a_i) h_j = c_i h_i$$

by (2.3); note that $c_i$ is automatically positive. Next, when an $i$-individual on the trunk
has offspring $\hat N_i = (\hat N_{ij})_{j \in S}$ with distribution $\hat p_i$, one of these offspring is
chosen as the successor on the trunk, where children are picked with probability proportional
to $h_j$ when their type is $j$. That is, the successor is of type $j$ with probability
$\hat N_{ij} h_j / \langle \hat N_i, h \rangle$ for a given offspring, and with probability

$$\mathbb{E}\Big( \frac{\hat N_{ij} h_j}{\langle \hat N_i, h \rangle} \Big) = \frac{m_{ij} h_j}{c_i h_i} = p_{ij}$$

on average. These are precisely the jump probabilities of the retrospective mutation
chain. Finally, the lifetime of an $i$-individual on the trunk will be exponential with
parameter $a_i + \lambda$, which coincides again with the holding time parameter of the
retrospective mutation chain. A corresponding embedded chain combined with size-biased
waiting times also occurs when more general non-Markovian populations (i.e., with
waiting times deviating from the exponential distribution) are traced backwards, see
[15, Proposition 1].
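Both claims of this paragraph, that the size-biased law (4.1) is a probability distribution and that the averaged successor probabilities are the jump probabilities of the retrospective mutation chain, can be checked for a concrete offspring law. The Python sketch below uses hypothetical two-type offspring laws with finite support (illustrative values and names, not from the paper).

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
# laws[i] = (probabilities, offspring vectors kappa) for a parent of type i
laws = [
    (np.array([0.25, 0.50, 0.25]), np.array([[0, 0], [1, 1], [2, 0]])),
    (np.array([0.25, 0.50, 0.25]), np.array([[0, 0], [0, 2], [2, 2]])),
]
M = np.array([probs @ kids for probs, kids in laws])   # mean offspring matrix
A = a[:, None] * (M - np.eye(2))

# Principal eigenvalue and right eigenvector (normalization of h is irrelevant here).
evals, right = np.linalg.eig(A)
k = np.argmax(evals.real)
lam, h = evals[k].real, np.abs(right[:, k].real)
c = 1.0 + lam / a                                      # normalizing constants c_i
P = M * h[None, :] / (c[:, None] * h[:, None])         # jump probabilities p_ij

size_biased, successor = [], []
for i, (probs, kids) in enumerate(laws):
    w = kids @ h                                       # <kappa, h> for each kappa
    p_hat = w * probs / (c[i] * h[i])                  # size-biased law (4.1)
    pos = w > 0                                        # kappa = 0 gets weight zero
    # average over p_hat of kappa_j h_j / <kappa, h>
    succ = (p_hat[pos, None] * kids[pos] * h[None, :] / w[pos, None]).sum(axis=0)
    size_biased.append(p_hat)
    successor.append(succ)
    print(f"type {i}: total mass {p_hat.sum():.6f}, successor law {succ}, p_i. = {P[i]}")
```

The empty offspring vector receives size-biased weight zero, which reflects the fact that the trunk always has at least one child.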

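We now construct the size-biased tree in detail. Let $\{\tau_x, N_x : x \in \mathbb{X}\}$ be as in
Section 2 and, independently of this, a sequence $\{\hat\tau_n, \hat N_n, \xi_n : n \ge 0\}$ of random
variables with values in $]0, \infty[$, $\mathbb{Z}_+^S$, $\mathbb{X}$ respectively such that, for a
given type $i_0 = i$ of the root, $\xi_0 = i$ and

• $\hat\tau_0$, $\hat N_0$ are independent, $\hat\tau_0$ has exponential distribution with parameter
$a_i + \lambda$, $\hat N_0$ has distribution $\hat p_i$, and $\xi_1$ has conditional distribution

$$\mathbb{P}\big(\xi_1 = (i_1; \ell_1) \mid \hat N_0, \hat\tau_0\big) = \frac{h_{i_1}}{\langle \hat N_0, h \rangle}\, I\{\ell_1 \le \hat N_{0,i_1}\}$$

for all $(i_1; \ell_1) \in \mathbb{X}_1$.

• For any $n \ge 1$, conditionally on $\mathcal{F}_{n-1} = \sigma(\hat\tau_k, \hat N_k, \xi_{k+1} : k < n)$,
$\hat\tau_n$, $\hat N_n$ are independent and follow an exponential law with parameter
$a_{\sigma(\xi_n)} + \lambda$ resp. the law $\hat p_{\sigma(\xi_n)}$, and

$$\mathbb{P}\big(\xi_{n+1} = (\xi_n; i_{n+1}, \ell_{n+1}) \mid \hat N_n, \hat\tau_n, \mathcal{F}_{n-1}\big) = \frac{h_{i_{n+1}}}{\langle \hat N_n, h \rangle}\, I\{\ell_{n+1} \le \hat N_{n,i_{n+1}}\}$$

for all $(i_{n+1}, \ell_{n+1}) \in S \times \mathbb{N}$, i.e., $\xi_{n+1}$ is a child of $\xi_n$ selected
randomly with weight proportional to $h_{\sigma(\xi_{n+1})}$.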

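Define $\hat X = \bigcup_{n \ge 0} \hat X_n \subset \mathbb{X}$ recursively by $\hat X_0 = \{i\}$ and
$\hat X_n = \hat X_n^{\xi} \cup \hat X_n^{\circ}$ with

$$\hat X_n^{\xi} = \{(\xi_{n-1}; i_n, \ell_n) \in \mathbb{X}_n : \ell_n \le \hat N_{n-1, i_n}\},$$

the offspring of $\xi_{n-1}$, and

$$\hat X_n^{\circ} = \{(\tilde x; i_n, \ell_n) \in \mathbb{X}_n : \tilde x \in \hat X_{n-1} \setminus \{\xi_{n-1}\},\ \ell_n \le N_{\tilde x, i_n}\}$$

the offspring of all other individuals in $\hat X_{n-1}$. (Note that in the last display there is no
hat on $N$; that is, the bushes have unbiased offspring.) The split times are given
by $\hat T_{\xi_0} = \hat\tau_0$, $\hat T_{\xi_n} = \hat T_{\xi_{n-1}} + \hat\tau_n$ for $n \ge 1$, and
$\hat T_x = \hat T_{\tilde x} + \tau_x$ if $x \in \hat X \setminus \{\xi_n : n \ge 0\}$.
(Again, in the latter case there is no hat on $\tau$, meaning that the individuals off the
trunk have unbiased life times.) The total population at time $t$ is then given by

$$\hat X(t) = \{x \in \hat X : \hat T_{\tilde x} \le t < \hat T_x\}.$$

The selected trunk individual at time $t$ is $\xi(t) = \xi_n$ if $\hat T_{\xi_{n-1}} \le t < \hat T_{\xi_n}$,
and the process $(\hat X(t), \xi(t))_{t \ge 0}$ in
$\Omega_* := D([0,\infty[, \mathcal{P}_f(\mathbb{X}) \times \mathbb{X}) = \Omega \times D([0,\infty[, \mathbb{X})$
describes the size-biased tree with trunk $(\xi(t))_{t \ge 0}$. As we have emphasized above, the type
process along the trunk, $\hat\sigma(t) := \sigma(\xi(t))$, is a copy of the retrospective mutation
chain as defined in Section 3. In contrast, the individuals off the trunk may be understood as a
branching process with immigration.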

We write $\hat{\mathbb{P}}^i_*$ for the distribution of $(\hat X(t), \xi(t))_{t \ge 0}$ on $\Omega_*$, and
$\hat{\mathbb{P}}^i$ for its marginal, the distribution of $(\hat X(t))_{t \ge 0}$ on $\Omega$. The
representation theorem below establishes the relationship between $\mathbb{P}^i$,
$\hat{\mathbb{P}}^i_*$ and the retrospective mutation chain. We use the shorthand $y[0,t]$ for a
path $(y(s))_{0 \le s \le t}$.

Theorem 4.1. Let $t > 0$, $i \in S$, and
$F : D([0,t], \mathcal{P}_f(\mathbb{X}) \times \mathbb{X}) \to [0, \infty[$ be any measurable function.
Then one has

$$h_i^{-1}\, \mathbb{E}^i\Big( e^{-\lambda t} \sum_{x \in X(t)} F(X[0,t], x[0,t])\, h_{\sigma(x)} \Big) = \hat{\mathbb{E}}^i_*\big( F(\hat X[0,t], \xi[0,t]) \big). \tag{4.2}$$

Recall that this theorem is valid for arbitrary sign of $\lambda$. The proof is postponed until
Section 5.1. Here we discuss some immediate consequences and possible extensions.
Remark 4.1. (a) Setting $F(X[0,t], x[0,t]) = I\{\sigma(x(t)) = j\}\, h_j^{-1}$ in (4.2) and using
the ergodic theorem for the retrospective mutation chain $\sigma(\xi(t))$ we obtain the
Perron-Frobenius result

$$\mathbb{E}^i(Z_j(t))\, e^{-\lambda t} = h_i\, \hat{\mathbb{P}}^i_*\big(\sigma(\xi(t)) = j\big)\, h_j^{-1} \longrightarrow h_i\, \alpha_j\, h_j^{-1} = h_i \pi_j. \tag{4.3}$$

In particular, Eq. (2.5) follows by summing over $j$.

(b) Taking any $F$ of the form $F(X[0,t], x[0,t]) = g(X[0,t])$ we conclude that

$$h_i^{-1}\, \mathbb{E}^i\big( W(t)\, g(X[0,t]) \big) = \hat{\mathbb{E}}^i\big( g(\hat X[0,t]) \big)$$

with

$$W(t) := \langle Z(t), h \rangle\, e^{-\lambda t}.$$

In particular, $h_i = \mathbb{E}^i(W(t))$. Thus, on the $\sigma$-algebra $\mathcal{F}_t$ generated by
$X[0,t]$, $\hat{\mathbb{P}}^i$ is absolutely continuous with respect to $\mathbb{P}^i$ with density
$W(t)/h_i$, and $(W(t))_{t \ge 0}$ is a martingale with respect to $\mathbb{P}^i$. The latter statement
is one of the standard facts of branching process theory; see e.g. [2], p. 209, Theorem 1.

(c) Theorem 4.1 has the appearance of the Campbell theorem of point process
theory; see, e.g., [21], pp. 14 & 228. To clarify the relation let $t > 0$ be fixed and

$$\Phi(t) = \{x[0,t] : x \in X(t)\}$$

the finite random subset of $D([0,t], \mathbb{X})$ which describes the lineages that survive until
time $t$. Also, let $C^i_t$ be the measure on
$\mathcal{P}_f(D([0,t], \mathbb{X})) \times D([0,t], \mathbb{X})$ with Radon-Nikodym density
$e^{\lambda t}\, h_i\, h_{\sigma(\xi(t))}^{-1}$ relative to the joint distribution of
$\hat\Phi(t) = \{x[0,t] : x \in \hat X(t)\}$ and $\xi[0,t]$ under $\hat{\mathbb{P}}^i_*$. Theorem 4.1
then implies that

$$\mathbb{E}^i\Big( \sum_{\xi \in \Phi(t)} F(\Phi(t), \xi) \Big) = \int F(\Phi, \xi)\, C^i_t(d\Phi, d\xi)$$

for any measurable $F \ge 0$, i.e., $C^i_t$ is the Campbell measure of $\Phi(t)$ under $\mathbb{P}^i$.
This assertion, however, is slightly weaker than Theorem 4.1 because $X[0,t]$ also includes
the lineages that die out before time $t$.

Remark 4.2. In the above, the size-biased tree was constructed using the right eigenvector
$h$ as a weight on the types. As a matter of fact, the same construction can
be carried out when $h$ is replaced by an arbitrary weight vector $\gamma \in\, ]0,\infty[^S$, and a
representation theorem analogous to Theorem 4.1 can be obtained. We discuss here
only the special case $\gamma \equiv 1$ which is of particular interest, and already appears in [9,
Theorem 2] in the context of critical multitype branching. The size-biased offspring
distribution associated with this case is

$$\tilde p_i(\kappa) = \|\kappa\|\, p_i(\kappa) / m_i, \qquad \kappa \in \mathbb{Z}_+^S,$$

where $\|\kappa\| = \sum_{j \in S} \kappa_j$ is the total offspring and $m_i = \sum_{j \in S} m_{ij}$ its
expectation under $p_i$. The lifetime of an $i$-individual on the trunk is exponential with
parameter $a_i m_i$, and the successor on the trunk is chosen among the children with equal
probability. Writing a tilde (instead of a hat) to characterize all quantities of the associated
size-biased tree, one arrives at the following counterpart of (4.2):

$$\mathbb{E}^i\Big( \sum_{x \in X(t)} e^{-t \langle L^x(t), r \rangle}\, F(X[0,t], x[0,t]) \Big) = \tilde{\mathbb{E}}^i_*\big( F(\tilde X[0,t], \tilde\xi[0,t]) \big). \tag{4.4}$$

In the above, $r$ is the vector with $i$-coordinate $r_i = a_i(m_i - 1) = \sum_{j \in S} a_{ij}$, the mean
reproduction rate of type $i$. Accordingly, the expectation $\langle L^x(t), r \rangle$ is the mean
reproduction rate along the lineage leading to $x$ at time $t$. The type process along the trunk,
$\tilde\sigma(t) := \sigma(\tilde\xi(t))$, is the Markov chain with transition rates
$\tilde g_{ij} = a_i (m_{ij} - m_i \delta_{ij})$. In view of the decomposition
$a_{ij} = \tilde g_{ij} + r_i \delta_{ij}$, this Markov chain describes the pure mutation
part of the type evolution.

On the left-hand side of (4.4), each individual is weighted according to the mean
fertility of its lineage. Indeed, suppose we are given a lineage up to time $t$ of which we
know only the intervals of time spent in each state $i \in S$, and imagine that random
split events and independent random offspring sizes are distributed over $[0,t]$ with
the appropriate rates and distributions. The number $\zeta_i$ of split events during the
sojourn in state $i$ is then Poisson with parameter $a_i t \nu_i$, where $\nu_i$ is the fraction of
time spent in state $i$; and the expected total offspring at each of these events is $m_i$. Since
offspring sizes are independent, the expected product of offspring sizes along the lineage
then amounts to $\prod_{i \in S} \mathbb{E}(m_i^{\zeta_i}) = e^{t \langle \nu, r \rangle}$. A result similar to
(4.4), with an analogous interpretation of the exponential factor, already appears in
[3, p. 127] in the context of Palm trees for spatially inhomogeneous branching.
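The two computations in this remark, the splitting of the generator into a pure mutation part plus reproduction rates and the Poisson calculation behind the exponential factor, can be checked numerically. The Python sketch below does so for a hypothetical two-type example; the parameter values, the occupation fractions `nu`, and the helper `poisson_pgf` are illustrative assumptions.

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
M = np.array([[1.0, 0.5], [0.5, 1.5]])
A = a[:, None] * (M - np.eye(2))             # generator, cf. (2.3)

m = M.sum(axis=1)                            # expected total offspring m_i
r = a * (m - 1.0)                            # mean reproduction rates r_i
G_tilde = a[:, None] * (M - np.diag(m))      # pure mutation rates

def poisson_pgf(mu, z, n_terms=200):
    """E(z**C) for C ~ Poisson(mu), summed as a truncated series."""
    term, total = np.exp(-mu), 0.0           # term = P(C = k) * z**k, starting at k = 0
    for k in range(n_terms):
        total += term
        term *= mu * z / (k + 1)
    return total

t = 3.0
nu = np.array([0.4, 0.6])                    # fractions of time spent in each type
expected_product = np.prod([poisson_pgf(a[i] * t * nu[i], m[i]) for i in range(2)])

print("A = G_tilde + diag(r):", np.allclose(A, G_tilde + np.diag(r)))
print("row sums of G_tilde:", G_tilde.sum(axis=1))
print("expected product:", expected_product, " exp(t <nu, r>):", np.exp(t * (nu @ r)))
```

The closed form used here is the Poisson generating function: the expectation of z to the power of a Poisson variable with mean mu equals exp(mu (z - 1)).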

Here are some consequences of (4.4):

(a) For $F(X[0,t], x[0,t]) = \exp\big[t \langle L^x(t), r \rangle\big]\, I\{\sigma(x(t)) = j\}$, Eq. (4.4)
becomes

$$\mathbb{E}^i\big(Z_j(t)\big) = \tilde{\mathbb{E}}^i_*\Big( e^{t \langle L^{\tilde\xi}(t), r \rangle}\, I\{\tilde\sigma(t) = j\} \Big), \tag{4.5}$$

which is a version of the Feynman-Kac formula. Indeed, consider the function
$u(t,i) = \mathbb{E}^i(Z_j(t))$ for fixed $j$. Since $u(t,i) = (e^{tA})_{ij}$, it follows that $u(t,i)$ is
the unique solution of the Cauchy problem

$$\frac{\partial}{\partial t} u(t,i) = \sum_{k \in S} \tilde g_{ik}\, u(t,k) + r_i\, u(t,i), \qquad u(0,i) = \delta_{ij},$$

which is given by the Feynman-Kac formula.
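The Feynman-Kac representation can be illustrated by a small Monte Carlo experiment: simulate the pure mutation chain, weight each path by the exponential of its integrated reproduction rate, and compare with the matrix exponential of the generator. The Python sketch below is a rough check under hypothetical two-type parameters; the sample size, seed, and helper names are illustrative assumptions, and the agreement is only up to Monte Carlo error.

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
M = np.array([[1.0, 0.5], [0.5, 1.5]])
A = a[:, None] * (M - np.eye(2))
m = M.sum(axis=1)
r = a * (m - 1.0)                            # reproduction rates r_i
G_tilde = a[:, None] * (M - np.diag(m))      # pure mutation generator

def expm_eig(A, t):
    """exp(tA) via eigendecomposition (assumes A is diagonalizable)."""
    evals, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(t * evals)) @ np.linalg.inv(V)).real

def feynman_kac_mc(i, t, G, r, n_samples, rng):
    """Estimate E[exp(int_0^t r(sigma_s) ds); sigma_t = j] for all j, starting from i."""
    n = len(r)
    estimate = np.zeros(n)
    for _ in range(n_samples):
        state, clock, integral = i, 0.0, 0.0
        while True:
            rate = -G[state, state]
            hold = rng.exponential(1.0 / rate) if rate > 0 else np.inf
            if clock + hold >= t:            # no further jump before time t
                integral += r[state] * (t - clock)
                break
            integral += r[state] * hold
            clock += hold
            jump = G[state].copy()
            jump[state] = 0.0
            state = rng.choice(n, p=jump / rate)
        estimate[state] += np.exp(integral)
    return estimate / n_samples

t, i = 1.0, 0
estimate = feynman_kac_mc(i, t, G_tilde, r, 20000, np.random.default_rng(0))
exact = expm_eig(A, t)[i]
print("Monte Carlo:", estimate)
print("exp(tA) row:", exact)
```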

(b) Summing over $j$ in (4.5) and using Varadhan's lemma of large deviation theory
(see [12, p. 32] or [23, Theorem 2.1]) together with (2.5) we arrive at the variational
principle

$$\lambda = \lim_{t \to \infty} \frac{1}{t} \log \mathbb{E}^i(|X(t)|) = \max_{\nu \in \mathcal{P}(S)} \big[ \langle \nu, r \rangle - \tilde I(\nu) \big],$$

where $\tilde I$ is the large deviation rate function for the empirical distribution of the Markov
chain with transition rates $\tilde g_{ij}$; cf. (5.10) for its definition in the case of the transition
rates $g_{ij}$. In fact, it is not difficult to see that the maximum is attained at (and only
at) the ancestral distribution $\alpha$. This variational principle is behind the one found in
[11].
(c) Just as in Remark 4.1(b) we find that

$$\tilde W(t) := \sum_{x \in X(t)} e^{-t \langle L^x(t), r \rangle}$$

is a martingale. In this martingale (which does not seem to have been considered so
far), each individual at time $t$ is weighted according to the mean fertility of its lineage.

5. Proofs 

5.1. Transforming the tree 

Here we prove Theorems 4.1 and 2.1(b). For the former we do not need that $\lambda$ is
positive.

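Proof of Theorem 4.1. It is sufficient to show that

$$\hat{\mathbb{E}}^i_*\big( F(\hat X[0,t], \xi[0,t]);\ \xi(t) = x \big) = e^{-\lambda t}\, h_i^{-1} h_{\sigma(x)}\, \mathbb{E}^i\big( F(X[0,t], x[0,t]);\ x \in X(t) \big) \tag{5.1}$$

for all $x \in \mathbb{X}$; the theorem then follows by summation over all $x \in \mathbb{X}$. Suppose that
$x = (i_1, \ldots, i_n; \ell_1, \ldots, \ell_n) \in \mathbb{X}_n$, and let
$x_k = (i_1, \ldots, i_k; \ell_1, \ldots, \ell_k)$ be its ancestor in generation $k$, $0 \le k \le n$.

Consider the right-hand side of (5.1) and write
$e^{-\lambda t}\, h_i^{-1} h_{\sigma(x)} = q_1 q_2 q_3$ with

$$q_1 = e^{-\lambda t} \prod_{k=0}^{n-1} c_{i_k}, \qquad q_2 = \prod_{k=0}^{n-1} \frac{\langle N_{x_k}, h \rangle}{c_{i_k} h_{i_k}}, \qquad q_3 = \prod_{k=0}^{n-1} \frac{h_{i_{k+1}}}{\langle N_{x_k}, h \rangle};$$

of course, the random quantities $q_2$ and $q_3$ must then be included into the expectation.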

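The factor $q_1$ corresponds to the time change obtained when the exponential parameter
$a_{i_k}$ is replaced by $a_{i_k} + \lambda = a_{i_k} c_{i_k}$ along the ancestral line of $x$, i.e., when
$\tau_{x_k}$ is replaced by $\hat\tau_k$ for $k = 0, \ldots, n$. Indeed, the associated Radon-Nikodym
density is

$$\tilde q_1 = e^{-\lambda T_x} \prod_{k=0}^{n} c_{i_k}.$$

Conditioning $\tilde q_1$ on the tree $X[0,t]$ up to time $t$ and using the loss of memory property
of the exponential law of $\tau_x$ we find that, almost surely on $\{T_{\tilde x} \le t < T_x\}$,

$$\mathbb{E}^i\big(\tilde q_1 \mid X[0,t]\big) = e^{-\lambda t}\, \mathbb{E}^i\big( e^{-\lambda (T_x - t)} \mid T_x > t \big) \prod_{k=0}^{n} c_{i_k} = q_1.$$

Here the last equality holds because, by the loss of memory property, the conditional
expectation equals $a_{i_n}/(a_{i_n} + \lambda) = 1/c_{i_n}$.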

Next, it is immediate from (4.1) that the factor $q_2$ is precisely the Radon-Nikodym
density corresponding to a change from $N_{x_k}$ to the size-biased offspring $\hat N_k$ for
$k = 0, \ldots, n-1$. Finally, $q_3$ is the conditional selection probability for the trunk:

$$q_3 = \hat{\mathbb{P}}^i_*\big( \xi_{k+1} = x_{k+1} \text{ for } 0 \le k < n \mid \hat X[0,t] \big).$$

The right-hand side of (5.1) is therefore equal to

$$\hat{\mathbb{E}}^i_*\big( F(\hat X[0,t], x[0,t]);\ \hat T_{\tilde x} \le t < \hat T_x,\ \xi_{k+1} = x_{k+1} \text{ for } 0 \le k < n \big) = \hat{\mathbb{E}}^i_*\big( F(\hat X[0,t], \xi[0,t]);\ \xi(t) = x \big),$$

as was to be shown.
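The factorization used in this proof is a telescoping product, which can be confirmed on a concrete lineage. The Python sketch below picks a hypothetical lineage (types along the ancestral line and the offspring vectors of the ancestors, all chosen for illustration) and compares the product of the three factors with the prefactor in (5.1).

```python
import numpy as np

# Hypothetical two-type example (illustrative values, not from the paper).
a = np.array([1.0, 2.0])
M = np.array([[1.0, 0.5], [0.5, 1.5]])
A = a[:, None] * (M - np.eye(2))

# Principal eigenvalue and right eigenvector (only ratios of h matter below).
evals, right = np.linalg.eig(A)
k = np.argmax(evals.real)
lam, h = evals[k].real, np.abs(right[:, k].real)
c = 1.0 + lam / a                            # constants c_i = 1 + lam / a_i

t = 1.7
types = [0, 1, 1, 0]                         # i_0, ..., i_n along the lineage (n = 3)
# Offspring vectors N_{x_k} of the ancestors x_0, ..., x_{n-1}; each one
# contains at least one child of the next type on the lineage.
offspring = np.array([[1, 1], [0, 2], [2, 2]])

q1 = np.exp(-lam * t) * np.prod([c[i] for i in types[:-1]])
q2 = np.prod([(N @ h) / (c[i] * h[i]) for i, N in zip(types[:-1], offspring)])
q3 = np.prod([h[j] / (N @ h) for j, N in zip(types[1:], offspring)])

target = np.exp(-lam * t) * h[types[-1]] / h[types[0]]
print("q1 * q2 * q3 =", q1 * q2 * q3)
print("e^{-lam t} h_sigma(x) / h_i =", target)

# Loss of memory: for tau ~ Exp(a_i), E(exp(-lam * tau)) = a_i / (a_i + lam) = 1 / c_i.
print("1 / c:", 1.0 / c, " a / (a + lam):", a / (a + lam))
```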

In the rest of this paper we assume that $\lambda>0$.

Proof of Theorem 2.1(b). The basic observation is that the martingale $W(t) = \langle Z(t),h\rangle\,e^{-\lambda t}$ considered in Remark 4.1(b) converges to a finite limiting variable $W\ge 0$ $P^i$-almost surely for each $i$. When combined with Theorem 2.1(a) to be proved below, this implies the asserted convergence result. The essential part of the proof consists in showing that $W$ is nontrivial if and only if condition (2.4) holds. There are two possible routes to achieve this.

Either one can consider a discrete time skeleton $\delta\mathbb N$ and simply apply the discrete-time version of the Kesten-Stigum theorem. For this one has to check that condition (2.4) holds if and only if $E^i(Z_j(\delta)\log Z_j(\delta))<\infty$ for all $i,j\in S$, which can be done.

Or, more naturally, one can use Theorem 4.1 to extend the conceptual proof of Lyons et al. [20] and Kurtz et al. [19] directly to continuous time. We spell out some details for the convenience of the reader. As in [20], one observes first that $W$ is nontrivial if and only if $\hat P^i$ is absolutely continuous with respect to $P^i$ (with Radon-Nikodym density $W/h_i$), which is the case if and only if

\[ \limsup_{t\to\infty}\hat W(t)<\infty\quad \hat P^i\text{-almost surely}; \tag{5.2} \]

here we have put a hat on $W$ to stress the change of the underlying measure.

To check that (5.2) is equivalent to (2.4) one notices first that (2.4) is equivalent to

\[ E\big(\log\langle\hat N_i,h\rangle\big)<\infty\quad\text{for all } i\in S, \tag{5.3} \]

by the properties of $\log$ and Eq. (4.1). Next one observes that $\hat X(t)\setminus\{\xi(t)\}$ is a branching process with immigration at the split times of the trunk $\xi(t)$. Specifically, let $T_{(n)}$ be the $n$-th split time and $N_{(n)}$ the $n$-th offspring of the trunk. The $N_{(n)}$ are independent (conditionally on the trunk), with distribution $\hat p_{\sigma(\xi_n)}$.

Suppose first (5.3) fails, and pick any $j\in S$ with $E(\log\langle\hat N_j,h\rangle)=\infty$. Consider the subsequence $(T_{(n_l)})_{l\ge1}$ of split times of the trunk for which $\sigma(\xi_{n_l})=j$. Since the random variables $\log\langle N_{(n_l)},h\rangle$ are i.i.d. with infinite mean, a standard Borel-Cantelli argument shows that $\limsup_{l\to\infty}l^{-1}\log\langle N_{(n_l)},h\rangle=\infty$ almost surely. On the other hand, $\limsup_{l\to\infty}T_{(n_l)}/l<\infty$ a.s. because the differences $T_{(n_{l+1})}-T_{(n_l)}$ are i.i.d. with finite mean. This gives

\[ \limsup_{l\to\infty}\hat W(T_{(n_l)}) \ge \limsup_{l\to\infty}\,\langle N_{(n_l)},h\rangle\,e^{-\lambda T_{(n_l)}} = \infty\quad\text{a.s.}, \]

so that (5.2) fails.
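The Borel-Cantelli step used here is the usual one for i.i.d. variables with infinite mean. As a sketch, write $Y_l := \log\langle N_{(n_l)},h\rangle$ (our shorthand), assume the $Y_l$ are i.i.d. with $E(Y_1^+)=\infty$, and fix an arbitrary constant $K>0$:

```latex
\begin{align*}
\sum_{l\ge1}P\big(Y_l>Kl\big)
  &= \sum_{l\ge1}P\big(Y_1^+/K>l\big)
  \;\ge\; \frac{E(Y_1^+)}{K}-1 \;=\; \infty,\\
\text{hence}\quad
P\big(Y_l>Kl\ \text{for infinitely many } l\big) &= 1
\quad\text{and}\quad
\limsup_{l\to\infty}\frac{Y_l}{l}\;\ge\;K\quad\text{a.s.}
\end{align*}
```

The first line uses $E(X)\le 1+\sum_{l\ge1}P(X>l)$ for $X\ge0$, the second the Borel-Cantelli lemma for independent events. Since $K$ is arbitrary, the limsup is infinite.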

Conversely, suppose (5.3) holds. As in Section 4, we consider the offspring $x$ of the trunk created at time $T_{(n)}$, having type counting measure $N_{(n)}$. We also introduce the $\sigma$-algebra $\mathcal T$ generated by the trunk variables $\{T_{(n)},N_{(n)}: n\ge0\}$ and use a tilde to characterize the trunk-reduced quantities obtained by removing the trunk individuals from the population. Then for each $t>0$ we obtain, with the notation (2.2),

\[ \hat E^i\big(\tilde W(t)\mid\mathcal T\big) = \sum_{n:\,T_{(n)}\le t}e^{-\lambda T_{(n)}}\,\hat E^i\Big(\sum_{x}\langle Z(x,t),h\rangle\,e^{-\lambda(t-T_{(n)})}\,\Big|\,\mathcal T\Big) = \sum_{n:\,T_{(n)}\le t}e^{-\lambda T_{(n)}}\,\langle\tilde N_{(n)},h\rangle, \]

where the inner sum runs over the non-trunk offspring $x$ created at time $T_{(n)}$,

by the martingale property of $W(t)$ applied to the descendant trees $X(x,\cdot)$. Now, (5.3) and a Borel-Cantelli argument imply that $n^{-1}\log\langle N_{(n)},h\rangle\to0$ almost surely. On the other hand, $\liminf_{n\to\infty}T_{(n)}/n>0$ by the law of large numbers, whence

\[ \sum_{n\ge0}e^{-\lambda T_{(n)}}\,\langle N_{(n)},h\rangle<\infty\quad\text{a.s.} \]

This means that, conditionally on $\mathcal T$, $\tilde W(t)$ is a submartingale with bounded expectation, which gives (5.2) by the submartingale convergence theorem and finishes the proof of Theorem 2.1(b). The final identity $\{W>0\}=\Omega_{\mathrm{surv}}$ a.s. follows from the trivial inclusion $\{W>0\}\subset\Omega_{\mathrm{surv}}$ and the well-known fact that $q_i=P^i(W=0)$ solves the equation $q_i=E\big(\prod_{j\in S}q_j^{N_{ij}}\big)$, which has the extinction probabilities as unique non-trivial solution [2, p. 205, Eq. (25)].

5.2. Laws of large numbers for population averages 

In this section we are concerned with laws of large numbers for population averages. We state a general such law for discrete time skeletons and then use it to prove Theorems 2.1(a) and 3.1. Recall from (2.1) that, for $t,u\ge0$ and $x\in X(t)$, the path $X(x,[t,t+u])=(X(x,t+s))_{0\le s\le u}$ describes the subtree of $x$-descendants during the time interval $[t,t+u]$.

Proposition 5.1. Let $\delta,u>0$, $i,j\in S$, and $f: D([0,u],\mathcal P(\mathbb X))\to\mathbb R$ be a measurable function with existing mean $c_j=E^j(f\circ X[0,u])$. Then

\[ \lim_{n\to\infty}\frac{1}{Z_j(n\delta)}\sum_{x\in X_j(n\delta)}f\circ X(x,[n\delta,n\delta+u]) = c_j\quad P^i\text{-almost surely on }\Omega_{\mathrm{surv}}. \]

Proof. This result follows essentially from Lemmas 3 and 4 in [19]. Since this 
reference contains no proof of the former, we provide a proof here for the sake of 
completeness. 

We assume first that $\delta$ is so large that $u<\delta$ and $\rho:=E^j(Z_j(\delta))>1$. Such a $\delta$ exists because $\lambda>0$ and $A$ is irreducible. Let $\mathcal F_{n\delta}$ denote the $\sigma$-algebra generated by $X[0,n\delta]$. Since $u<\delta$, for each $n\ge1$ the random variables $\varphi_{n,x}:=f\circ X(x,[n\delta,n\delta+u])$ with $x\in X_j(n\delta)$ are $\mathcal F_{(n+1)\delta}$-measurable and, conditionally on $\mathcal F_{n\delta}$, i.i.d. with mean $c_j$. This implies that the sequence $(\varphi_l)_{l\ge1}$ on $\Omega_{\mathrm{surv}}$ obtained by enumerating first $\{\varphi_{1,x}: x\in X_j(\delta)\}$ in some order, then $\{\varphi_{2,x}: x\in X_j(2\delta)\}$ and so on, is still i.i.d. with mean $c_j$. The strong law of large numbers therefore implies that $\lim_{k\to\infty}(1/k)\sum_{l=1}^{k}\varphi_l=c_j$ $P^i$-almost surely on $\Omega_{\mathrm{surv}}$, and thus in particular that the subsequence

\[ A_n := \frac{1}{\Psi_n}\sum_{l=1}^{n}\ \sum_{x\in X_j(l\delta)}\varphi_{l,x} \]

converges to $c_j$ $P^i$-almost surely on $\Omega_{\mathrm{surv}}$ as $n\to\infty$; here $\Psi_n=\sum_{l=1}^{n}\psi_l$ with $\psi_l=Z_j(l\delta)$.

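Next, the sequence $(\psi_l)_{l\ge1}$ dominates a single-type discrete-time Galton-Watson process with mean $\rho>1$, and the latter survives precisely on $\Omega_{\mathrm{surv}}$. By Lemma 4 of [19], it follows that $\liminf_{n\to\infty}\psi_n/\psi_{n-1}\ge\rho$ almost surely on $\Omega_{\mathrm{surv}}$. This implies that

\[ \limsup_{n\to\infty}\Psi_{n-1}/\psi_n = \limsup_{n\to\infty}\sum_{l<n}\psi_l/\psi_n<\infty \]

almost surely on $\Omega_{\mathrm{surv}}$. As

\[ \frac{1}{\psi_n}\sum_{x\in X_j(n\delta)}\varphi_{n,x} = A_n+(A_n-A_{n-1})\,\Psi_{n-1}/\psi_n, \]

the proposition follows in the case of large $\delta$.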

If $\delta>0$ is arbitrary, we choose some $k\in\mathbb N$ such that $\delta':=k\delta$ is so large as required above. Let $0\le l<k$. Applying the preceding result to each of the subtrees $X(x,[l\delta,\infty[)$ with $x\in X(l\delta)$ and averaging, we then find that

\[ \lim_{n\to\infty}\frac{1}{Z_j((nk+l)\delta)}\sum_{x\in X_j((nk+l)\delta)}\varphi_{nk+l,x} = c_j \]

$P^i$-almost surely on $\Omega_{\mathrm{surv}}$, and the proof is complete.



A typical application of the preceding proposition concerns type counting measures. Consider the $X_j(s)$-averaged type counting measure

\[ C_{j,u}(s) = \frac{1}{Z_j(s)}\sum_{x\in X_j(s)}Z(x,s+u) \tag{5.4} \]

at time $s+u$, where $Z(x,s+u)$ is defined by (2.2). Proposition 5.1 then immediately implies the following corollary.

Corollary 5.1. For any $\delta,u>0$ and $i,j\in S$,

\[ C_{j,u}(n\delta)\to E^j(Z(u))\quad\text{as } n\to\infty,\ P^i\text{-almost surely on }\Omega_{\mathrm{surv}}. \]

To pass from a discrete time skeleton to continuous time we will use the following 
continuity lemma which follows also from Proposition 5.1. 

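Lemma 5.1. Given $\varepsilon>0$, there exists some $\delta>0$ such that for all $i,j\in S$ and $k\in\mathbb N$ one has

\[ \limsup_{n\to\infty}\ \sup_{n\delta\le s\le(n+1)\delta}\frac{\|Z(s)\|}{\|Z(n\delta)\|}\le1+\varepsilon, \tag{5.5} \]

\[ \liminf_{n\to\infty}\ \inf_{n\delta\le s\le(n+1)\delta}\frac{Z_j(s)}{Z_j(n\delta)}\ge1-\varepsilon, \tag{5.6} \]

and

\[ \liminf_{n\to\infty}\ \inf_{n\delta\le s\le(n+1)\delta}\ \inf_{k\delta\le u\le(k+1)\delta}\frac{\sum_{y\in X_j(s)}\|Z(y,s+u)\|}{\sum_{y\in X_j(n\delta)}\|Z(y,n\delta+k\delta)\|}\ge1-\varepsilon \tag{5.7} \]

$P^i$-almost surely on $\Omega_{\mathrm{surv}}$.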

Proof. We begin by proving the upper bound (5.5). For $n\delta\le s\le(n+1)\delta$ we can write

\[ \|Z(s)\| = \sum_{x\in X(n\delta)}|X(x,s)| \le \sum_{x\in X(n\delta)}M(x,[n\delta,(n+1)\delta]), \]

where $M(x,[n\delta,(n+1)\delta])=\max_{n\delta\le s\le(n+1)\delta}|X(x,s)|$. Hence

\[ \sup_{n\delta\le s\le(n+1)\delta}\frac{\|Z(s)\|}{\|Z(n\delta)\|} \le \max_{j\in S}\frac{1}{Z_j(n\delta)}\sum_{x\in X_j(n\delta)}M(x,[n\delta,(n+1)\delta]). \]

By Proposition 5.1, the last expression converges to $m(\delta):=\max_{j\in S}E^j(M(0,[0,\delta]))$ almost surely on $\Omega_{\mathrm{surv}}$. Now, $M(0,[0,\delta])$ is dominated by the total size at time $\delta$ of the modified branching process for which the random variables $N_{x,\sigma(x)}$ in Section 2 are replaced by $N_{x,\sigma(x)}\vee1$, so that each individual has at least one offspring of its own type. The latter process has a finite generator matrix, say $A^+$. Hence $m(\delta)\le\max_j(e^{\delta A^+}\mathbf 1)_j\to1$ as $\delta\to0$. This completes the proof of (5.5).

Next we note that (5.6) follows from (5.7) by setting $u=k=0$. So it only remains to prove (5.7). Let $n\delta\le s\le(n+1)\delta$ and $k\delta\le u\le(k+1)\delta$. Considering only those individuals $y\in X_j(s)$ already alive at time $n\delta$ and still alive at time $(n+1)\delta$, and only those descendants $z\in X(y,s+u)$ living during the whole period $[(n+k)\delta,(n+k+2)\delta]$, we obtain the estimate

\[ \sum_{y\in X_j(s)}\|Z(y,s+u)\| \ge \sum_{x\in X_j(n\delta)}I\{\tau_{x,n\delta}>\delta\}\sum_{z\in X(x,(n+k)\delta)}I\{\tau_{z,(n+k)\delta}>2\delta\}. \]

Here we write $\tau_{x,t}=\inf\{u>0: x\notin X(t+u)\}=T_x-t$ for the remaining life time of $x\in X(t)$ after time $t$. Proposition 5.1 therefore implies that the left-hand side of (5.7) is at least

\[ E^j\Big(I\{\tau_{j,0}>\delta\}\sum_{z\in X(k\delta)}I\{\tau_{z,k\delta}>2\delta\}\Big)\Big/E^j\big(|X(k\delta)|\big) \tag{5.8} \]

$P^i$-almost surely on $\Omega_{\mathrm{surv}}$. By the Markov property, the numerator is equal to

\[ E^j\Big(I\{\tau_{j,0}>\delta\}\sum_{z\in X(k\delta)}\exp[-2\delta a_{\sigma(z)}]\Big) \ge e^{-2\delta a}\,E^j\big(I\{\tau_{j,0}>\delta\}\,|X(k\delta)|\big) \]

with $a=\max_ja_j$. The ratio in (5.8) is therefore not smaller than $e^{-2\delta a}(1-\varepsilon_k)$, where

\[ \varepsilon_k = E^j\big(I\{\tau_{j,0}\le\delta\}\,|X(k\delta)|\big)\big/E^j\big(|X(k\delta)|\big). \]

For $k=0$ we have $\varepsilon_0=1-e^{-\delta a_j}$. For $k\ge1$ we can use Theorem 4.1 to obtain

\[ \varepsilon_k = \hat E^j\big(I\{\tau_{j,0}\le\delta\}\,h^{-1}_{\sigma(\xi(k\delta))}\big)\big/\hat E^j\big(h^{-1}_{\sigma(\xi(k\delta))}\big) \le \frac{\max_ih_i}{\min_ih_i}\,\big(1-e^{-(a+\lambda)\delta}\big). \]

Hence, if $\delta$ is sufficiently small then the ratio in (5.8) is larger than $1-\varepsilon$.

We are now ready for the proofs of Theorems 2.1(a) and 3.1.

Proof of Theorem 2.1(a). Essentially we reproduce here the argument of [19]. Let $\varepsilon>0$ be given and $\varepsilon'>0$ be such that, for every $\nu\in\mathcal P(S)$, $\|\nu-\pi\|<\varepsilon$ whenever $\|a\nu-\pi\|<\varepsilon'$ for some $a>0$. Let $\delta>0$ be so small as required in Lemma 5.1. According to (4.3), we can choose some $u\in\delta\mathbb N$ so large that

\[ \|E^j(Z(u))\,e^{-\lambda u}-h_j\pi\|<\varepsilon'\min_{i\in S}h_i \]

for all $j\in S$. Corollary 5.1 then implies that, $P^i$-almost surely on $\Omega_{\mathrm{surv}}$,

\[ \|C_{j,u}(s)\,e^{-\lambda u}-h_j\pi\|<\varepsilon'\min_{i\in S}h_i \]

for all sufficiently large $s\in\delta\mathbb N$. Writing $\Pi(t)=Z(t)/\|Z(t)\|$ and $a(t)=\|Z(t)\|\,e^{-\lambda u}/\langle Z(t-u),h\rangle$ for $t>u$, we conclude that

\[ \|a(t)\Pi(t)-\pi\| \le \frac{1}{\langle Z(t-u),h\rangle}\sum_{j\in S}Z_j(t-u)\,\|C_{j,u}(t-u)\,e^{-\lambda u}-h_j\pi\| < \varepsilon' \]

and therefore $\|\Pi(t)-\pi\|<\varepsilon$ for all sufficiently large $t\in\delta\mathbb N$ a.s. on $\Omega_{\mathrm{surv}}$. Finally, using (5.5) and (5.6) we find that $\Pi_j(t)\ge(1-2\varepsilon)\pi_j-\varepsilon$ for all $j\in S$ and all sufficiently large real $t$, again a.s. on $\Omega_{\mathrm{surv}}$. Since $\varepsilon$ was arbitrary and $\Pi(t),\pi\in\mathcal P(S)$, this gives the desired convergence result.

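Proof of Theorem 3.1. Recall the definition (3.1) of $A^u(t)\in\mathcal P(S)$, the $X(t)$-average of the ancestral type distribution at time $t-u$, and let $\alpha^u\in\mathcal P(S)$ be given by its coordinates $\alpha^u_j=\pi_j\,E^j(\|Z(u)\|)\,e^{-\lambda u}$. Since $\alpha^u\to\alpha$ as $u\to\infty$ by (2.5), it is sufficient to show that

\[ P^i\big(\forall\,u>0:\ A^u(t)\to\alpha^u\ \text{as } t\to\infty \,\big|\, \Omega_{\mathrm{surv}}\big)=1. \tag{5.9} \]

Fix any $j\in S$, $u>0$ and $\delta>0$. By Corollary 5.1,

\[ \|C_{j,u}(s)\|\to E^j(\|Z(u)\|)\quad\text{as } s\to\infty \text{ through } \delta\mathbb N \]

$P^i$-almost surely on $\Omega_{\mathrm{surv}}$. Combining this with Remark 3.1 and Theorem 2.1(a) we obtain, writing again $\Pi_j(s):=Z_j(s)/\|Z(s)\|$,

\[ A^u_j(s+u) = \frac{Z_j(s)\,\|C_{j,u}(s)\|}{|X(s+u)|} = \frac{\Pi_j(s)\,\|C_{j,u}(s)\|}{\sum_{k\in S}\Pi_k(s)\,\|C_{k,u}(s)\|} \longrightarrow \alpha^u_j\quad\text{as } s\to\infty \text{ through } \delta\mathbb N, \]

$P^i$-almost surely on $\Omega_{\mathrm{surv}}$.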
Next let $\varepsilon>0$ be given and $\delta>0$ be chosen according to Lemma 5.1. Applying the above to $u=k\delta$ with arbitrary $k\in\mathbb N$ and using (5.5) and (5.7) we find that

\[ P^i\big(\forall\,u>0\ \forall\,j\in S:\ \liminf_{t\to\infty}A^u_j(t)\ge(1-2\varepsilon)\,\alpha^u_j \,\big|\, \Omega_{\mathrm{surv}}\big)=1, \]

where the $u$-uniformity in (5.7) allows us to bring the $u$-quantifier inside of the probability. This gives (5.9) because $\varepsilon$ is arbitrary and $A^u(t)$ and $\alpha^u$ are probability measures on $S$.

5.3. Application of large deviation theory 

In this section we prove Theorems 3.2 and 3.3. The main tools are the representation theorem 4.1 and the Donsker-Varadhan large deviation principle for the empirical process of the retrospective mutation chain. In fact, these two ingredients together imply a large deviation principle for the type histories as follows. For every $\nu\in\mathcal P_\Theta(\Sigma)$ let

\[ H_G(\nu) = \sup_{t>0}H(\nu_{[0,t]};\mu_{[0,t]})/t \]

be the process-level large deviation rate function for the retrospective mutation chain. In the above, $\nu_{[0,t]}$ and $\mu_{[0,t]}$ are the restrictions of $\nu$ and $\mu$ to the time interval $[0,t]$, and $H(\nu_{[0,t]};\mu_{[0,t]})$ is their relative entropy. See [4, Eq. (4.4.28)]; alternative expressions can be found in [4, Theorem 4.4.38] and [23, Theorems 7.3 and 7.4].

Theorem 5.1. For the empirical type evolution process $R^x(t)$ as in (3.6) we have, for $i\in S$ and closed $F\subset\mathcal P_\Theta(\Sigma)$,

\[ \limsup_{t\to\infty}\frac1t\log E^i\Big(\sum_{x\in X(t)}I\{R^x(t)\in F\}\Big) \le \lambda-\inf_{\nu\in F}H_G(\nu), \]

while for open $G\subset\mathcal P_\Theta(\Sigma)$

\[ \liminf_{t\to\infty}\frac1t\log E^i\Big(\sum_{x\in X(t)}I\{R^x(t)\in G\}\Big) \ge \lambda-\inf_{\nu\in G}H_G(\nu). \]

Moreover, the function $H_G$ is lower semicontinuous with compact level sets and attains its minimum precisely at $\mu$.

Proof. In view of Theorem 4.1, for every measurable $C\subset\mathcal P_\Theta(\Sigma)$ we have

\[ E^i\Big(\sum_{x\in X(t)}I\{R^x(t)\in C\}\Big) = h_i\,e^{\lambda t}\,\hat E^i\big(I\{R^{\xi(t)}(t)\in C\}\,h^{-1}_{\sigma(\xi(t))}\big). \]

Since $\max_i|\log h_i|<\infty$, the $h$'s can be ignored on the exponential scale. The theorem thus follows from the Donsker-Varadhan large deviation principle; see [23, p. 37, Theorem 7.8] or [4, Theorem 4.4.27], for example.
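The identity from Theorem 4.1 can be checked numerically in its simplest instance, with the indicator replaced by the constant 1. The sketch below is not part of the original argument and rests on two assumptions: that first moments satisfy $E^i|X(t)|=(e^{tA}\mathbf 1)_i$, and that the type process of the trunk is a Markov chain with generator $g_{ij}=h_j(a_{ij}-\lambda\delta_{ij})/h_i$. The $2\times2$ matrix `A` and all function names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, t, squarings=10, terms=12):
    """exp(t*m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2 ** squarings
    small = [[t * m[i][j] / scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def perron_2x2(a):
    """Largest eigenvalue and a positive right eigenvector of a 2x2
    matrix with positive off-diagonal entries."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    lam = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
    return lam, [a[0][1], lam - a[0][0]]

def many_to_one_sides(a, t):
    """Both sides of the identity with F = 1, for each initial type i."""
    lam, h = perron_2x2(a)
    # Assumed generator of the trunk's type process (h-transform of A).
    g = [[h[j] * (a[i][j] - (lam if i == j else 0.0)) / h[i]
          for j in range(2)] for i in range(2)]
    exp_a, exp_g = mat_exp(a, t), mat_exp(g, t)
    lhs = [sum(exp_a[i]) for i in range(2)]  # assumed E^i |X(t)|
    rhs = [h[i] * math.exp(lam * t) * sum(exp_g[i][j] / h[j] for j in range(2))
           for i in range(2)]
    return lhs, rhs, g

# Hypothetical mean generator; lambda is about 0.7, h proportional to (1, 0.5).
A = [[0.2, 1.0], [0.5, -0.3]]
lhs, rhs, G = many_to_one_sides(A, 2.0)
```

Under these assumptions `lhs` and `rhs` agree up to rounding, and the rows of `G` sum to zero, as they must for a Markov generator.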

There is a similar large deviation principle on the level of empirical distributions. For $\rho\in\mathcal P(S)$ let

\[ I_G(\rho) = \sup_{v>0}\Big[-\sum_{i\in S}\rho_i\,(Gv)_i/v_i\Big] = \inf_{\nu\in\mathcal P_\Theta(\Sigma):\,\nu_0=\rho}H_G(\nu) \tag{5.10} \]

be the level-two rate function of the retrospective mutation chain; here we write $\nu_0$ for the time-zero marginal distribution of $\nu$. (For the second identity see [23, p. 37, Theorem 7.9].) Then the following statement holds.

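Corollary 5.2. For any $i\in S$ and closed $F\subset\mathcal P(S)$,

\[ \limsup_{t\to\infty}\frac1t\log E^i\Big(\sum_{x\in X(t)}I\{L^x(t)\in F\}\Big) \le \lambda-\inf_{\nu\in F}I_G(\nu), \]

while for open $G\subset\mathcal P(S)$

\[ \liminf_{t\to\infty}\frac1t\log E^i\Big(\sum_{x\in X(t)}I\{L^x(t)\in G\}\Big) \ge \lambda-\inf_{\nu\in G}I_G(\nu). \]

Moreover, the function $I_G$ is continuous and strictly convex and attains its minimum precisely at $\alpha$.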

Proof. Simply replace the process-level large deviation principle for the retrospective mutation chain by the one for its empirical distributions. The latter can either be deduced from the former by the contraction principle, see [23, Theorems 2.3 & 7.9], or be proved directly as in [12, Section IV.4].
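For a two-state chain the variational formula for the level-two rate function can be evaluated by hand, which gives a useful sanity check. The sketch below assumes the Donsker-Varadhan form $\sup_{v>0}[-\sum_i\nu_i(Gv)_i/v_i]$ and a hypothetical generator $G=\begin{pmatrix}-a&a\\b&-b\end{pmatrix}$ with rates $a,b>0$; it is an illustration, not the retrospective mutation chain of any particular model. Normalising $v=(1,e^s)$ makes the objective concave in $s$, so a ternary search suffices.

```python
import math

def rate_objective(nu1, a, b, s):
    """-sum_i nu_i (Gv)_i / v_i for v = (1, e^s) and G = [[-a, a], [b, -b]]."""
    nu2 = 1.0 - nu1
    r = math.exp(s)
    return -(nu1 * a * (r - 1.0) + nu2 * b * (1.0 / r - 1.0))

def level_two_rate(nu1, a, b, lo=-30.0, hi=30.0, iters=200):
    """Supremum over v > 0 via ternary search in s (objective is concave)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if rate_objective(nu1, a, b, m1) < rate_objective(nu1, a, b, m2):
            lo = m1
        else:
            hi = m2
    return rate_objective(nu1, a, b, 0.5 * (lo + hi))

def closed_form(nu1, a, b):
    """(sqrt(a*nu1) - sqrt(b*nu2))^2, from optimising over v by hand."""
    return (math.sqrt(a * nu1) - math.sqrt(b * (1.0 - nu1))) ** 2

# The rate vanishes at the stationary distribution (b, a)/(a + b)
# and is strictly positive elsewhere.
numeric = level_two_rate(0.3, 1.0, 2.0)
exact = closed_form(0.3, 1.0, 2.0)
```

In this toy case the numerical supremum matches the closed form, and the unique zero sits at the stationary distribution, in line with the statement that the rate function attains its minimum precisely there.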



We are now ready for the proofs of Theorems 3.2 and 3.3. 

Proof of Theorem 3.3. Let $d$ be a metric for the weak topology on $\mathcal P_\Theta(\Sigma)$. To be specific, we let $d_\Sigma$ denote the Skorohod metric on $\Sigma$ (defined in analogy to the one-sided case considered in [8, p. 117, Eq. (5.2)]), and $d$ be the associated Prohorov metric on $\mathcal P_\Theta(\Sigma)$; see [8, p. 96, Eq. (1.1)]. For any fixed $\varepsilon>0$ we consider the set $C=\{\nu\in\mathcal P_\Theta(\Sigma): d(\nu,\mu)\ge\varepsilon\}$, the complement of the open $\varepsilon$-neighborhood of $\mu$. In view of Remark 3.3 we need to show that

\[ \Gamma(t,C) := \frac{1}{|X(t)|}\sum_{x\in X(t)}I\{R^x(t)\in C\}\ \longrightarrow\ 0\quad\text{as } t\to\infty \]

$P^i$-almost surely on $\Omega_{\mathrm{surv}}$. In the first part of the proof we will establish this convergence along a discrete time skeleton $\delta\mathbb N$, where $\delta>0$ is arbitrary.

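Since $C$ is closed and $H_G$ has compact level sets and attains its minimum at $\mu$ only, the infimum $c:=\inf_{\nu\in C}H_G(\nu)$ is strictly positive. We can therefore choose a constant $\lambda>\gamma>\lambda-c$. We write

\[ \Gamma(t,C) = \frac{e^{\gamma t}}{|X(t)|}\cdot e^{-\gamma t}\sum_{x\in X(t)}I\{R^x(t)\in C\} \]

and show that each factor tends to $0$ along $\delta\mathbb N$ a.s. on $\Omega_{\mathrm{surv}}$. In view of Corollary 5.1 and Theorem 2.1(a),

\[ \frac{|X((n+1)\delta)|}{|X(n\delta)|}\ \longrightarrow\ \sum_{j\in S}\pi_j\,E^j(|X(\delta)|) = e^{\lambda\delta}\quad\text{a.s. on }\Omega_{\mathrm{surv}}. \]

Hence $n^{-1}\log|X(n\delta)|\to\lambda\delta$ and therefore $e^{\gamma n\delta}/|X(n\delta)|\to0$ a.s. on $\Omega_{\mathrm{surv}}$. On the other hand, using Markov's inequality and Theorem 5.1 we obtain for any $a>0$

\[ \limsup_{n\to\infty}\frac{1}{n\delta}\log P^i\Big(e^{-\gamma n\delta}\sum_{x\in X(n\delta)}I\{R^x(n\delta)\in C\}\ge a\Big) \le \lambda-c-\gamma<0. \]

The Borel-Cantelli lemma thus shows that also the second factor of $\Gamma(t,C)$ tends to $0$ a.s. as $t\to\infty$ through $\delta\mathbb N$. We therefore conclude that $\lim_{n\to\infty}\Gamma(n\delta,C)=0$ a.s. on $\Omega_{\mathrm{surv}}$.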
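The passage from the exponential bound to almost sure convergence can be spelled out as follows. This is a sketch in which $\eta>0$ is an auxiliary constant (not in the original) chosen so small that $\gamma-\lambda+c-\eta>0$, and $c$ is assumed finite; the estimate then holds for all sufficiently large $n$ by the upper bound of Theorem 5.1.

```latex
\begin{align*}
P^i\Big(e^{-\gamma n\delta}\sum_{x\in X(n\delta)}I\{R^x(n\delta)\in C\}\ge a\Big)
  &\le \frac{e^{-\gamma n\delta}}{a}\,
       E^i\Big(\sum_{x\in X(n\delta)}I\{R^x(n\delta)\in C\}\Big)\\
  &\le \frac{1}{a}\,e^{-(\gamma-\lambda+c-\eta)\,n\delta}.
\end{align*}
```

The right-hand side is summable in $n$, so by the Borel-Cantelli lemma the event on the left occurs only finitely often, almost surely, for every $a>0$. Letting $a$ run through a sequence decreasing to $0$ gives the stated convergence of the second factor.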

To extend this result to the full convergence $t\to\infty$ along all reals we pick some $0<\varepsilon'<\varepsilon$ and let $C'$ be defined in terms of $\varepsilon'$ instead of $\varepsilon$. Also, let $A$ be an arbitrary closed set in $\Sigma$, $\varepsilon^*=\varepsilon-\varepsilon'$, and $A^*=\{\sigma\in\Sigma: d_\Sigma(\sigma,A)<\varepsilon^*\}$ the $\varepsilon^*$-augmentation of $A$. Then for any two time instants $s,t$ with $s<t<s+\delta$ and every $y\in X(t)$ we can write

\[ R^y(t)(A) \le \frac1t\int_0^s1_A\big(\vartheta_u\sigma(y)_{t,\mathrm{per}}\big)\,du+\frac{\delta}{t} \le R^{y(s)}(s)(A^*)+\frac1s\int_0^sI\big\{u: d_\Sigma\big(\vartheta_u\sigma(y(s))_{s,\mathrm{per}},\vartheta_u\sigma(y)_{t,\mathrm{per}}\big)\ge\varepsilon^*\big\}\,du+\frac{\delta}{t}. \]

By the locality of the Skorohod metric there exists a constant $c=c(\varepsilon^*)$ such that $d_\Sigma(\vartheta_u\sigma(y(s))_{s,\mathrm{per}},\vartheta_u\sigma(y)_{t,\mathrm{per}})<\varepsilon^*$ whenever the interval $[-u,s-u]$ on which these functions agree contains $[-c,c]$. The second term in the last sum is therefore at most $2c/s$, whence

\[ R^y(t)(A) \le R^{y(s)}(s)(A^*)+\varepsilon^* \]

for sufficiently large $s$. This means that $d(R^y(t),R^{y(s)}(s))\le\varepsilon^*$ and therefore

\[ \{R^y(t)\in C\}\subset\{R^{y(s)}(s)\in C'\} \]

when $s$ is large enough. For such $s$ we obtain

\[ \Gamma(t,C)-\Gamma(s,C') \le \Big(\frac{1}{|X(t)|}-\frac{1}{|X(s)|}\Big)|X(t)| + \frac{1}{|X(s)|}\sum_{x\in X(s)}I\{R^x(s)\in C'\}\,\big(|X(x,t)|-1\big) \]

\[ \le 1-\inf_{s<t<s+\delta}|X(t)|/|X(s)| + \frac{1}{|X(s)|}\sum_{x\in X(s)}\big(M(x,[s,s+\delta])-1\big), \]

where $M(x,[s,s+\delta])=\max_{s\le t\le s+\delta}|X(x,t)|$ as in the proof of Lemma 5.1. Setting $s=n\delta$, letting $n\to\infty$ and using Theorem 2.1(a) and Proposition 5.1 we see that the last term converges to $E^\pi(M(0,[0,\delta])-1)$ a.s. on $\Omega_{\mathrm{surv}}$. According to the proof of (5.5), this limit can be made arbitrarily small if $\delta$ is chosen small enough. In combination with (5.6) and the first part of this proof, this shows that $\limsup_{t\to\infty}\Gamma(t,C)<a$ for every $a>0$ almost surely on $\Omega_{\mathrm{surv}}$. The proof is thus complete.




H.-O. GEORGII AND E. BAAKE 



Proof of Theorem 3.2. There are two possible routes for the proof. One can either repeat the argument above by simply replacing Theorem 5.1 by Corollary 5.2. Or one notices that $L^x(t)$ is the time-zero marginal of $R^x(t)$ and that the marginal mapping $\nu\to\nu_0$ is continuous in the topologies chosen. The latter fact is used for the derivation of the level-two large deviation principle from that on the process level by means of the contraction principle; see [23, p. 34].